16 research outputs found

    Segtor: Rapid Annotation of Genomic Coordinates and Single Nucleotide Variations Using Segment Trees

    Research projects often involve determining the position of genomic coordinates, intervals, single nucleotide variations (SNVs), insertions, deletions and translocations relative to genes, and their potential impact on protein translation. Because next-generation sequencing has greatly increased throughput, investigators routinely need to annotate very large datasets. We present Segtor, a tool to annotate large sets of genomic coordinates, intervals, SNVs, indels and translocations. Instead of storing genomic features in a database management system, Segtor builds segment trees from the start and end coordinates of the features the user wishes to use. The software also produces annotation statistics that let users see how many coordinates fall within various portions of genes. Segtor can currently be made to work with any species available on the UCSC Genome Browser. It is a suitable tool for groups, especially those with limited access to programmers or with an interest in analyzing large numbers of individual genomes, that wish to determine the relative position of very large sets of mapped reads and subsequently annotate observed mutations between the reads and the reference. Segtor (http://lbbc.inca.gov.br/segtor/) is an open-source tool that can be freely downloaded for non-profit use. We also provide a web interface for testing purposes.
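To illustrate the idea of annotating coordinates with a segment tree built from feature start and end coordinates, here is a minimal Python sketch. It is not Segtor's code: the class name `FeatureSegmentTree`, the gene names and the coordinates are all hypothetical, and features are assumed to be closed integer intervals on a single chromosome.

```python
from bisect import bisect_right


class FeatureSegmentTree:
    """Static segment tree answering 'which features overlap this position?'.

    Hypothetical illustration only; features are (start, end, name) tuples
    with closed integer coordinates on one chromosome.
    """

    def __init__(self, features):
        # Elementary segments are the half-open ranges between consecutive
        # boundary points; a closed [start, end] becomes [start, end + 1).
        self.points = sorted({p for s, e, _ in features for p in (s, e + 1)})
        self.size = max(1, len(self.points) - 1)
        self.nodes = [[] for _ in range(2 * self.size)]
        index = {p: i for i, p in enumerate(self.points)}
        for start, end, name in features:
            lo = index[start] + self.size
            hi = index[end + 1] + self.size  # exclusive upper leaf
            # Store the feature on O(log n) nodes that exactly cover it.
            while lo < hi:
                if lo & 1:
                    self.nodes[lo].append(name)
                    lo += 1
                if hi & 1:
                    hi -= 1
                    self.nodes[hi].append(name)
                lo >>= 1
                hi >>= 1

    def query(self, position):
        """Return the names of all features containing `position`."""
        if not self.points or not (self.points[0] <= position < self.points[-1]):
            return []
        node = bisect_right(self.points, position) - 1 + self.size
        hits = []
        while node >= 1:  # walk from the leaf up to the root
            hits.extend(self.nodes[node])
            node >>= 1
        return hits


# Hypothetical features and query positions.
tree = FeatureSegmentTree([(100, 200, "geneA"), (150, 300, "geneB"), (400, 500, "geneC")])
overlap_175 = sorted(tree.query(175))
overlap_350 = tree.query(350)
```

Each query walks one leaf-to-root path, so its cost grows with the logarithm of the number of features plus the number of hits, which is what makes in-memory annotation of very large coordinate sets practical without a database.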

    In silico identification of essential proteins in Corynebacterium pseudotuberculosis based on protein-protein interaction networks

    Background: Corynebacterium pseudotuberculosis (Cp) is a gram-positive bacterium that is classified into equi and ovis serovars. The serovar ovis is the etiological agent of caseous lymphadenitis, a chronic infection affecting sheep and goats that causes economic losses due to carcass condemnation and decreased production of meat, wool, and milk. Current diagnosis and treatment protocols are not fully effective, so further research into Cp pathogenesis is required. Results: Here, we mapped known protein-protein interactions (PPI) from various species to nine Cp strains to reconstruct parts of the potential Cp interactome and to identify potentially essential proteins serving as putative drug targets. On average, we predict 16,669 interactions for each of the nine strains (with 15,495 interactions shared among all strains). An in silico sanity check suggests that the potential networks were not formed by spurious interactions but have a strong biological bias. With the inferred Cp networks we identify 181 essential proteins, among which 41 are non-host homologous. Conclusions: The list of candidate interactions of the Cp strains lays the basis for developing novel hypotheses and designing corresponding wet-lab studies. The non-host homologous essential proteins are attractive targets for therapeutic and diagnostic purposes. They allow searching for small-molecule inhibitors of binding interactions, enabling modern drug discovery. Overall, the predicted Cp PPI networks form a valuable and versatile tool for researchers interested in Corynebacterium pseudotuberculosis.

    Validation of a method for predicting protein-protein interaction networks and its application to Corynebacterium pseudotuberculosis to identify essential proteins

    Made available in DSpace on 2019-08-09 (tese___edson_luiz_folador.pdf).
    Corynebacterium pseudotuberculosis (Cp) belongs to the CMNR group (Corynebacterium, Mycobacterium, Nocardia, Rhodococcus). It is a gram-positive, facultative intracellular pathogenic bacterium that has fimbriae, is non-motile, does not form capsules and does not sporulate, and it occurs as the serovars ovis and equi. The serovar equi infects horses and cattle. The serovar ovis mainly infects herds of sheep and goats and is the etiological agent of caseous lymphadenitis (CLA). Cp is prevalent in many countries, causing significant economic losses due to poor carcass quality and decreased production of meat, wool and milk. Methods for the diagnosis and treatment of CLA are not yet sufficiently effective because Cp responds poorly to therapy and is able to persist in the environment and in the host, so it is important to understand the biology of this pathogen at the systems level. In this regard, knowing the proteins and their interactions is crucial to understanding the molecular mechanisms of the cell, and protein-protein interaction networks are a good tool for this type of study. To generate the Cp interaction network, we first validated a methodology for predicting interactions against publicly available experimental and curated data. As a result, in addition to increasing network coverage, we obtained an area under the curve (AUC) between 0.93 and 0.96, with a cutoff of 0.70 corresponding to a specificity of 0.95 and a sensitivity of 0.90. With the validated methodology, interaction networks were generated for nine Cp serovar ovis strains; ~99% of the interactions were mapped from the genus Corynebacterium, and 15,495 interactions were conserved among the strains. Analyses of the shortest path and of the degree distribution suggest that the predicted networks have the characteristics of biological networks. Additionally, we compared the clustering coefficient, correlation and R² values against randomly generated networks and submitted the generated networks to the Shapiro-Wilk normality test. All results show that the predicted interaction networks do not have a random distribution, suggesting that the networks were not formed by spurious interactions and that there is a biological influence on their prediction. With the validated networks, we selected the top 15% of proteins with the most interactions and identified 181 essential proteins. Only the DNA repair protein RecN had no homolog in the Database of Essential Genes (DEG), and three others had homology in only one DEG organism: catalase (KatA), endonuclease III (Nth) and trigger factor (Tig), suggesting that they may be good targets for diagnosis or drug development.
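The hub-based selection mentioned in this abstract (taking the top 15% of proteins by number of interactions) can be sketched as follows. This is an assumption-laden illustration, not the thesis pipeline: the function name `top_hubs`, the toy edge list, the rounding rule (ceiling) and the alphabetical tie-breaking are all hypothetical choices.

```python
import math
from collections import Counter


def top_hubs(edges, fraction=0.15):
    """Return the `fraction` of proteins with the highest degree.

    Assumptions (not from the source): edges are undirected, duplicates
    and self-loops are ignored, the count is rounded up, and ties are
    broken alphabetically by protein name.
    """
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for pair in unique:
        for protein in pair:
            degree[protein] += 1
    k = math.ceil(fraction * len(degree))
    ranked = sorted(degree, key=lambda p: (-degree[p], p))
    return ranked[:k]


# Hypothetical toy network with seven proteins.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
    ("B", "A"), ("E", "A"), ("F", "B"), ("G", "A"),
]
hubs = top_hubs(toy_edges)
```

In the toy network, protein A has five partners and B has three, so they form the top 15% (rounded up to two proteins). In a real analysis the candidate list would then be compared against an essential-gene resource such as DEG, as the abstract describes.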

    An improved interolog mapping-based computational prediction of protein protein interactions with increased network coverage

    Automated and efficient methods that map ortholog interactions from several organisms and public databases (pDBs) are needed to identify new interactions in an organism of interest (interolog mapping). When computational methods are applied to predict interactions, it is important that they be validated and their efficiency demonstrated. In this study, we compare six Blast+ metrics over three datasets to identify the best metric for predicting protein-protein interactions. Using Blast+ to align the protein pairs, the ortholog interactions from the Database of Interacting Proteins (DIP) were mapped to the String, Intact and Psibase pDBs. For each interaction mapped to each pDB, we retrieved the alignment score, e-value, bitscore, similarity, identity and coverage. We evaluated these Blast+ values, and combinations thereof, with Receiver Operating Characteristic (ROC) curves and computed the Area Under the Curve (AUC). To validate these predictions, we used a subset of DIP composed of experimental interactions curated by the International Molecular Exchange (IMEx). The cut-off point for each metric/pDB was computed to identify the one that best separates the true and false predicted interactions. In contrast to other methods that consider only the first Blast hit, we considered the first 20 hits, thus increasing the number of predicted interaction pairs. In addition, we identified the contribution of each individual pDB, as well as their combined contribution, to the prediction. The best metric had an AUC of 0.96 for a single pDB and an AUC of 0.93 for combined pDBs. With a cut-off point of 0.70, which corresponds to a specificity of 0.95 and a sensitivity of 0.90 for an individual pDB, our method predicts protein-protein interactions efficiently compared with other studies.
    Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
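As a rough illustration of the ROC-based evaluation described in this abstract, the sketch below computes an AUC and the sensitivity and specificity at a given cut-off for a scored set of predictions. The scores and labels are invented, the function names are hypothetical, and scores are assumed to lie between 0 and 1 with higher values indicating a more likely interaction; the abstract does not specify how its metrics were scaled or how the cut-off was chosen.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen true interaction
    scores higher than a randomly chosen false one (ties count half).
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as a predicted interaction."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores for eight candidate pairs; 1 = curated true interaction.
example_scores = [0.95, 0.85, 0.75, 0.60, 0.80, 0.40, 0.30, 0.20]
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]

example_auc = auc(example_scores, example_labels)
example_sens, example_spec = sensitivity_specificity(example_scores, example_labels, 0.70)
```

Sweeping the cut-off over the observed scores and recording each (sensitivity, specificity) pair traces out the ROC curve, from which a cut-off that balances the two can be chosen.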

    Functional annotation of hypothetical proteins from the Exiguobacterium antarcticum strain B7 reveals proteins involved in adaptation to extreme environments, including high arsenic resistance

    Exiguobacterium antarcticum strain B7 is a psychrophilic Gram-positive bacterium that possesses enzymes that can be used for several biotechnological applications. However, many proteins from its genome are considered hypothetical proteins (HPs). These functionally unknown proteins may have important functions in the biology of this bacterium, and bioinformatics tools can assist in understanding this organism through functional annotation analysis. Thus, our study aimed to assign functions to proteins previously described as HPs in the genome of E. antarcticum B7. We used an extensive in silico workflow combining several bioinformatics tools for function annotation, sub-cellular localization and physicochemical characterization, three-dimensional structure determination, and protein-protein interactions. This genome contains 2772 genes, of which 765 CDS were annotated as HPs. The amino acid sequences of all HPs were submitted to our workflow and we successfully attributed function to 132 HPs. We identified 11 proteins that play important roles in the mechanisms of adaptation to adverse environments, such as flagellar biosynthesis, biofilm formation, carotenoid biosynthesis, and others. In addition, three predicted HPs are possibly related to arsenic tolerance. Through an in vitro assay, we verified that E. antarcticum B7 can grow at high concentrations of arsenic. The approach used was important to precisely assign function to proteins from diverse classes and to infer relationships with proteins whose functions are already described in the literature. This approach aims to produce a better understanding of the mechanism by which this bacterium adapts to extreme environments and to find targets of biotechnological interest.

    Molecular characterization of the hexose transporter gene in benznidazole resistant and susceptible populations of Trypanosoma cruzi.

    Submitted by Nuzia Santos on 2014-06-27. Made available in DSpace on 2014-06-27. Previous issue date: 2012.
    Fundação Oswaldo Cruz. Centro de Pesquisas René Rachou. Belo Horizonte, MG, Brazil / Fundação Oswaldo Cruz. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brazil / Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Belo Horizonte, MG, Brazil
    Background: Hexose transporters (HT) are membrane proteins involved in the uptake of energy-supplying glucose and other hexoses into the cell. Previous studies employing the Differential Display technique have shown that the transcription level of the HT gene from T. cruzi (TcrHT) is higher in an in vitro-induced benznidazole (BZ)-resistant population of the parasite (17 LER) than in its susceptible counterpart (17 WTS).
    Methods: In the present study, TcrHT has been characterized in populations and strains of T. cruzi that are resistant or susceptible to BZ. We investigated the copy number and chromosomal location of the gene, the levels of TcrHT mRNA and of TcrHT activity, and the phylogenetic relationship between TcrHT and HTs from other organisms. Results: In silico analyses revealed that 15 sequences of the TcrHT gene are present in the T. cruzi genome, considering both CL Brener haplotypes. Southern blot analyses confirmed that the gene is present as a multicopy tandem array and indicated a nucleotide sequence polymorphism associated with T. cruzi group I or II. Karyotype analyses revealed that TcrHT is located in two chromosomal bands varying in size from 1.85 to 2.6 Mb depending on the strain of T. cruzi. According to phylogenetic analysis, the amino acid sequence of the HT from T. cruzi is closely related to the HT sequences of Leishmania species. Northern blot and quantitative real-time reverse transcriptase polymerase chain reaction analyses revealed that TcrHT transcript levels are 2.6-fold higher in the resistant 17 LER population than in the susceptible 17 WTS. Interestingly, the hexose transporter activity was 40% lower in the 17 LER population than in all other T. cruzi samples analyzed. This phenotype was detected only in the in vitro-induced BZ-resistant population, not in the in vivo-selected or naturally BZ-resistant T. cruzi samples. Sequencing analysis revealed that the amino acid sequences of the TcrHT from the 17 WTS and 17 LER populations are identical. This result suggests that the difference in glucose transport between the 17 WTS and 17 LER populations is not due to point mutations, but probably due to a lower protein expression level. Conclusion: The BZ-resistant population 17 LER presents a decrease in glucose uptake in response to drug pressure.

    Structural model of protein Eab7_0741, predicted as a Ribosomal silencing factor RsfS.

    (A) Three-dimensional model obtained from the Protein Data Bank. (B) Alignment between the RsfS protein structures from B. halodurans and E. antarcticum generated by MODELLER. (C) Ramachandran plot provided by the PROCHECK program for the RsfS protein from E. antarcticum B7.